arXiv:1508.05381v1 [q-bio.QM] 21 Aug 2015


FRUIT FLIES AND MODULI: INTERACTIONS 
BETWEEN BIOLOGY AND MATHEMATICS 

EZRA MILLER 


Possibilities for using geometry and topology to analyze statistical problems in biology
raise a host of novel questions in geometry, probability, algebra, and combinatorics
that demonstrate the power of biology to influence the future of pure mathematics. This
is a tour through some biological explorations and their mathematical ramifications.

A biological hypothesis. Evolution sometimes results in discrete morphological
differences among populations that diverge from a common source. This “saltation” can
occur with features quantified by integers, such as limbs, segments, petals, teeth, or
digits (humans are occasionally born with six fingers), or quantified by other discrete
invariants, such as tessellation patterns of seeds in flowers or protomers in virus capsids.
Biology has explanations of how populations that already exhibit a varying trait can
lead to populations in which one or the other dominates. The question here is: what
mechanism generates topological variation in sufficient quantity for selection to act?

Take the fruit fly, for example. The normal Drosophila melanogaster wing depicted
on the left differs from the other two, abnormal wings in topology as well as geometry.



Indeed, mathematically the veins in each wing can be abstracted as an embedded 
planar graph, with a location for each vertex and a contour for each arc. The graph 
in the middle has an extra edge, and hence two extra vertices, while the graph on the 
right is lacking a vertex. These topological variants, along with many others, occur in 
natural D. melanogaster populations, but rarely. On the other hand, different species 
of Drosophila exhibit a range of wing vein topologies. How did that come to be? Wing 
veins serve several key purposes, as structural supports as well as conduits for airways, 
nerves, and blood cells, among other things [8]. Is it possible that some force causes 
aberrant vein topologies to occur more frequently than would otherwise be expected 
in a natural population—frequently enough for evolutionary processes to act? 

Results from biologist Kenneth Weber, and later with more power by David Houle’s 
lab, show that selecting for continuous wing deformations results in skews toward
deformed wings with normal vein topology [39, 40, 20], but also, unexpectedly, much
higher rates of topological novelty. This latter claim, which is unpublished and has
yet to be tested statistically, suggests a fundamental biological hypothesis: topological 
novelty arises at the extreme of selection for continuous shape characteristics. 

Wings to modules. This hypothesis could potentially be tested using persistent
homology, a tool for data analysis that uses computational topology to assign modules
over polynomial rings to subsets $X \subseteq \mathbb{R}^n$ [12]. This tool is a good
candidate because of its ability to emphasize differences in stratification among
otherwise similar subsets of $\mathbb{R}^n$.

Take our case of an embedded graph $X \subseteq \mathbb{R}^2$, for example. For any
nonnegative real numbers $r$ and $s$, let $X^r_s \subseteq \mathbb{R}^2$ be the set of
points at distance at least $r$ from every vertex and within $s$ of some edge. Thus
$X^r_s$ is obtained by removing the union of the balls of radius $r$ around the vertices
from the union of the $s$-neighborhoods of the edges. In the following magnified portion
of the middle wing, $r$ is approximately twice $s$:



The homology $H_i(X^r_s)$ with coefficients in a field $k$ counts connected components
or loops of $X^r_s$, when $i = 0$ or $1$, respectively. Introducing a new vertex to an
edge of $X$ tends to create connected components and destroy cycles when $r \gg s$,
because the balls around vertices protect them from the expanding edges. The precise
relations between $r$ and $s$ that alter the topology of $X^r_s$ depend on the geometry
of $X$, such as the angles between edges at a given vertex, and that is the point:
persistent homology uses topological invariants as measures of geometry.

To keep the data structure finite, the parameters $r$ and $s$ can, without loss of
significant information, be restricted to integer multiples of a small positive length
$\varepsilon$. The persistent homology of the graph $X$ with the two parameters $r$ and
$s$ is then defined to be the direct sum $M_i(X) = \bigoplus_{r,s} H_i(X^r_s)$ of all of
the homology groups. It is a bigraded module over the polynomial ring $k[x, y]$: the
variables $x$ and $y$ act on $M_i(X)$ by comparing the homology of $X^r_s$ with that of
$X^{r-\varepsilon}_s$ and $X^r_{s+\varepsilon}$, respectively, via the maps induced by
the inclusions of $X^r_s$ into these larger sets.
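As a rough illustration of the construction just described, the following sketch pixelates the set of points at distance at least r from every vertex and within s of some edge, then counts its connected components, which is the dimension of the degree-0 homology. Everything here is an assumption made for illustration: the function names, the straight-line edges, the pixel size h, and the use of 4-connected pixels as a stand-in for the true topology. It only treats connected components, not loops.

```python
from collections import deque
from math import hypot


def dist_point_segment(p, a, b):
    """Euclidean distance from point p to the straight segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def count_components(vertices, edges, r, s, h=0.02):
    """Count 4-connected components of the pixelated region.

    A pixel belongs to the region when its center is at distance >= r from
    every vertex and at distance <= s from some edge (hypothetical discretization).
    """
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    x0, y0 = min(xs) - s - h, min(ys) - s - h
    nx = int((max(xs) - min(xs) + 2 * (s + h)) / h) + 1
    ny = int((max(ys) - min(ys) + 2 * (s + h)) / h) + 1
    region = set()
    for i in range(nx):
        for j in range(ny):
            p = (x0 + (i + 0.5) * h, y0 + (j + 0.5) * h)
            far = all(hypot(p[0] - vx, p[1] - vy) >= r for vx, vy in vertices)
            near = any(dist_point_segment(p, vertices[a], vertices[b]) <= s
                       for a, b in edges)
            if far and near:
                region.add((i, j))
    seen, components = set(), 0
    for cell in region:
        if cell in seen:
            continue
        components += 1
        seen.add(cell)
        queue = deque([cell])
        while queue:  # breadth-first flood fill of one component
            i, j = queue.popleft()
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in region and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return components


# One edge from (0, 0) to (2, 0), without and with an extra vertex at (1, 0).
plain = ([(0.0, 0.0), (2.0, 0.0)], [(0, 1)])
subdivided = ([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], [(0, 1), (1, 2)])

print(count_components(*plain, r=0.3, s=0.1))       # 1 component
print(count_components(*subdivided, r=0.3, s=0.1))  # 2 components
```

With r three times s, the ball around the added vertex cuts the thickened edge in two, so the component count rises from 1 to 2. Tabulating these counts over integer multiples of a small step in r and s would give the dimensions of the graded pieces of the degree-0 module, though not the maps between them.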

Persistent homology with only one parameter [37, 15], instead of two or more, results 
in a module over a polynomial ring in one variable. This case is much better studied, 
in part because it behaves more tamely. In particular, there is a finite, computable set
of topological features (connected components, loops or, in the general case, features
of higher dimension), each of which has well defined parameters where its “birth” and
“death” occur, such that every homology class is a direct sum of these features [14]. For
fly wings, or arbitrary multiparameter situations, where the homology groups record
the topology of several increasing chains of subsets of a single topological space, no such
clean description is possible [12]. However, alternative presentations of modules from
combinatorial commutative algebra [31] (or see [33, Chapter 11]), based essentially on
the theory of primary decomposition, can be understood topologically in terms of birth
and death parameters. Such understanding is necessary if statistics on sets of bigraded
modules are to be interpretable biologically.
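To make the one-parameter birth and death description concrete, here is a minimal sketch that computes the degree-0 barcode of the sublevel sets of a function sampled along a path. The setting (a function on a path), the name `h0_barcode`, and the union-find bookkeeping are assumptions chosen for brevity; this is the standard "elder rule" and is not claimed to be the algorithm of the cited references.

```python
def h0_barcode(values):
    """Degree-0 persistence of sublevel sets of a function on a path graph.

    Vertex i enters the filtration at height values[i]. A component is born at
    a local minimum and dies when it merges into a component born earlier.
    """
    n = len(values)
    parent = list(range(n))
    birth = {}                      # root index -> birth height of its component
    active = [False] * n
    bars = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in sorted(range(n), key=lambda k: values[k]):
        active[i] = True
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and active[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the later birth dies here.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < values[i]:   # skip bars of zero length
                    bars.append((birth[young], values[i]))
                parent[young] = old
    roots = {find(i) for i in range(n)}
    bars.extend((birth[root], float("inf")) for root in roots)
    return sorted(bars)


print(h0_barcode([0, 3, 1, 4, 2, 5]))
# [(0, inf), (1, 3), (2, 4)]
```

Each pair records the parameter where a connected component appears and the parameter where it merges into an older one; the single infinite bar is the component that never dies. No comparably simple list of intervals is available once a second parameter is introduced.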

Moduli as statistical sample spaces. Persistent homology summarizes a sample of
fly wings by transforming it into a sample of modules. It is a general principle of
statistics that to analyze samples from a set of objects one needs sufficient understanding
of the set of all objects from metric, probabilistic, and sometimes combinatorial
perspectives. How far apart are pairs of objects? How likely is each object to be selected
at random? Mathematics excels at placing coherent structures on sets of all objects
of a given type. The resulting “moduli spaces” pervade geometry of many sorts
(differential, algebraic, arithmetic, complex, discrete) and also theoretical physics and
topology, though in the latter field they are called classifying spaces. But despite their
ubiquity and in some cases our substantial understanding of their geometry and
combinatorics, less is known about the probability and statistics of sampling from them.

Like many moduli spaces, the ones parametrizing bigraded modules over $k[x, y]$ are
quotients of algebraic varieties by continuous group actions [12]. This makes the moduli
spaces complicated unions of manifolds of varying dimension. One possibility, covered
in the next section, is to develop geometric methods to analyze samples from such
“stratified spaces”. Another, which tends to be favored a priori for computational
reasons, is to use discrete invariants as proxies for the continuous moduli. For bigraded
modules, these discrete invariants include

• single-parameter persistence by tracing zigzags through the groups $H_i(X^r_s)$ [11];

• Hilbert series, meaning the dimensions of the vector spaces $H_i(X^r_s)$, disregarding
all of the homomorphisms between them;

• rank invariants, which take into account the ranks of the homomorphisms but
not their precise algebraic structure [12, 10];

• Betti numbers, which record discrete homological invariants of the module [29].

Any discrete invariant subdivides the moduli space into regions where the invariant is
constant. Understanding the nature of these subdivisions is a pragmatic matter for
bigraded $k[x, y]$-modules, given the fly wing context, but (modifications of) some of these
new sorts of questions make sense for any moduli space, or indeed any stratified space.

1. What metric or combinatorial properties do these subdivisions possess? For
example, do the regions have equal dimension and roughly equal size, or are
there a few big regions (or only one) of top dimension and a bunch of smaller ones?

2. What distribution of discrete invariants is expected from a given (biological)
problem? Might the discrete invariants be expected to distinguish between the
modules produced by applied situations even if the regions aren’t of similar size?

3. General geometric statistical question: what (natural) measures should be placed
on a set of discrete invariants, given the geometry of the moduli spaces?

4. Can the continuous variation be captured discretely to desired precision? More
precisely, is there a family (indexed by $n$) of sets of discrete invariants such that
letting $n \to \infty$ results in an increasingly fine subdivision?

Geometric probability on stratified spaces. As we have seen for fly wings,
statistical problems where the sample objects are more complicated than vectors in vector
spaces naturally lead to sampling from stratified spaces. The goal of geometric
statistics in this setting is, as in ordinary linear statistics, to identify, describe, summarize,
or make inferences about an unknown probability distribution on the sample space 
from which the sample points are assumed to be drawn. To that end, it is crucial to 
understand the opposite problem, from probability theory: given a distribution on the 
relevant sample space, how do samples from that distribution behave? 

The simplest summary of a distribution is a point, namely an average or population mean,
about which the distribution is centered. Laws of large numbers assert that means of 
increasingly large random samples from a distribution converge to a population mean. 
Statistics requires knowledge of the expected difference between a sample mean and 
population mean. Central limit theorems help quantify that difference by describing 
the variation of sample means around population means. In ordinary statistics, for 
example, when the sample space is the real line, the central limit theorem dictates that 
sample means vary around the population mean according to a distribution that is, in 
the limit of infinite sample size, Gaussian.

In Euclidean space, basic concepts such as mean, expectation, and average coincide 
and therefore admit multiple equivalent characterizations, such as via least squares 
or arithmetic average. Already thinking about asymptotics of samples from smooth 
manifolds—let alone singular spaces such as the moduli spaces relevant to fly wings— 
requires a radical shift in perspective, as compared with samples from linear spaces, 
because different characterizations lift to different notions in curved settings [23].
Moreover, for many of these notions, such as the Fréchet mean defined by least squares
(that is, as a minimizer of the expected squared distance to the sample points), the
minimizer is not unique: what is the average of the north pole and south pole on 
the sphere? It is the entire equator. (This explains the phrase “a population mean” 
in the previous paragraph, as opposed to “the population mean”.) Nonetheless, laws 
of large numbers hold [41, 5], and central limit theorems exist in various situations 
[26, 17, 18, 6, 22], such as when the data are concentrated near a Fréchet mean. (Many
of these theorems were motivated by biology; read the title of [22], for instance.) For 
additional background and references concerning statistics on manifolds, see [23]. 
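The non-uniqueness of least-squares means on the sphere can be checked numerically. The sketch below evaluates the sum of squared great-circle distances to the north and south poles of the unit sphere; the function names and the one-degree grid of colatitudes are illustrative assumptions.

```python
from math import acos, cos, sin, pi


def sphere_distance(p, q):
    """Great-circle distance between unit vectors p and q."""
    dot = sum(a * b for a, b in zip(p, q))
    return acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error


def frechet_function(p, sample):
    """Sum of squared distances from p to the sample points."""
    return sum(sphere_distance(p, x) ** 2 for x in sample)


def point(colatitude, longitude):
    """Unit vector with the given spherical coordinates (radians)."""
    return (sin(colatitude) * cos(longitude),
            sin(colatitude) * sin(longitude),
            cos(colatitude))


poles = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]

# Scan colatitudes 0..180 degrees along one meridian.
values = [frechet_function(point(k * pi / 180, 0.0), poles) for k in range(181)]
best = min(range(181), key=values.__getitem__)
print(best)  # 90: the minimum along the meridian is on the equator

# The value on the equator does not depend on the longitude.
print([round(frechet_function(point(pi / 2, lon), poles), 9)
       for lon in (0.0, 1.0, 2.5)])
```

At colatitude t the function equals t^2 + (pi - t)^2, which is minimized at t = pi/2 regardless of longitude, so every point of the equator is a minimizer.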

Statistics on smooth manifolds relies on approximation of the manifold by its tangent
space, which is Euclidean. Once a metric on the manifold has been specified (a
necessary and often nontrivial step for statistics, because of the need to know how far
apart sample objects are), the exponential map at a point $x$ takes a neighborhood $U$
of $0$ in the tangent space to a neighborhood $\exp(U)$ around $x$. Ordinary probability
in the vector space is transformed into geometric probability on the smooth sample
space via the exponential map at $x$, which is close to an isometry when $U$ is small. In
particular, central limit theorems on smooth manifolds can be interpreted as describing
variation of sample means around a population mean by pushing forward the linear
setup along an exponential map.

In the singular setting of stratified spaces, there is no general method to compare
with or reduce to ordinary linear probability and statistics in the tangent space at a
mean. The types of sample spaces $M$ intended here are those possessing a topological
stratification (see [16] or [35]): an expression as a disjoint union
$M = M_0 \cup M_1 \cup \cdots \cup M_r$ of strata such that for all $d \in \{0, \ldots, r\}$,

• the stratum $M_d$ is a manifold,

• $M_0 \cup \cdots \cup M_d$ is closed in $M$, and

• for every pair $x, y \in M_d$ there is a homeomorphism $M \to M$ that takes $x$ to $y$
and takes each stratum to itself.

The third condition ensures that the topology of $M$ and its stratification behave
precisely the same way near $x$ as near $y$. Examples include graphs (or networks),
whose strata are vertices and edges; polytopes, whose strata are (relatively open) faces;
and real (semi)algebraic varieties, whose strata consist of classes of singular points. The
tangent space to a stratified space $M$ at a point $x$ is a cone over a stratified space
of dimension one less than $\dim(M)$. If $M$ is already a cone with apex $x$, then the
tangent space $T_x = M$ is as complicated as $M$ itself; capturing the local structure of a
stratified space near a point need not simplify the geometry the way it does in the
smooth setting.

Cases where stratified laws of large numbers and central limit theorems are known
occur in specific simple examples where comparison with linear spaces is possible,
such as open books [19] (unions of Euclidean half-spaces glued along their boundary
subspaces), isolated planar hyperbolic singularities (cones where the singular point has
angle sum $> 2\pi$ instead of the ice-cream case of $< 2\pi$) [24], and binary trees [4]. But
in general, de novo geometric constructions are required. Such has been the avenue
for the good deal of probability theory, including laws of large numbers, that has been
established in the generality of nonpositively curved spaces (see [38]), which are defined
as spaces whose triangles with given edge lengths are thinner than would be expected
from Euclidean geometry (see [9]). The metric structures of nonpositively curved spaces
induce a number of simplifying consequences, such as uniqueness of Fréchet means,
which have played important roles in the progress thus far in geometric probability and
statistics on stratified spaces. Nonetheless, the promising interactions of nonpositive
curvature with geometrically stratified probability and statistics remain in their infancy.

Sticky means. The novelty of attempting statistics on stratified sample spaces is
exemplified by nonclassical “sticky” phenomena that can occur at singularities. In
Euclidean statistics, the mean of a finite set of points moves slightly in any desired
direction by perturbing the points. This intuition extends to manifolds, by linear
approximation [26, 17, 6, 22], but it can fail even in the simplest singular sample
spaces. Consider the tripod, for instance, depicted at left:



In the center-left figure, the Fréchet mean $p$ of the three points on the legs is the
origin $0$, by symmetry. But wiggling the three points, as in the center-right figure,
does not move the Fréchet mean at all; one of the points would have to be moved more
than twice as far from the center, as in the final figure, to nudge the mean onto its leg.
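The "more than twice as far" threshold can be verified directly. In the sketch below a point of the tripod is encoded as a pair (leg, distance from the center), an encoding chosen here for illustration. On leg k at distance u, the sum of squared distances is the sum of (t - u)^2 over sample points on leg k plus the sum of (t + u)^2 over the others, so its derivative at u = 0 is negative exactly when the distances on leg k add up to more than the distances on all other legs combined.

```python
def tripod_distance(p, q):
    """Geodesic distance on the tripod; a point is (leg, distance from center)."""
    (leg_p, t_p), (leg_q, t_q) = p, q
    return abs(t_p - t_q) if leg_p == leg_q else t_p + t_q


def frechet_function(p, sample):
    """Sum of squared tripod distances from p to the sample points."""
    return sum(tripod_distance(p, x) ** 2 for x in sample)


def tripod_mean(sample, legs=3):
    """Minimizer of the Frechet function, from the first-order condition.

    Returns (leg, distance); the center is reported as (0, 0.0).
    """
    n = len(sample)
    total = sum(t for _, t in sample)
    for k in range(legs):
        on_leg = sum(t for leg, t in sample if leg == k)
        pull = 2 * on_leg - total   # leg k versus all other legs combined
        if pull > 0:                # at most one leg can win this comparison
            return (k, pull / n)
    return (0, 0.0)


symmetric = [(0, 1.0), (1, 1.0), (2, 1.0)]
wiggled = [(0, 1.2), (1, 0.9), (2, 1.1)]
far = [(0, 2.5), (1, 1.0), (2, 1.0)]

print(tripod_mean(symmetric))  # (0, 0.0): the center
print(tripod_mean(wiggled))    # (0, 0.0): still the center
print(tripod_mean(far))        # on leg 0, at distance 0.5 / 3 from the center
```

Moving one point out to exactly twice the common distance of the others leaves the mean at the center; only beyond that does the mean leave it, at a rate of one third of the excess.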

An open book with three pages is a product of the tripod with a vector space
$\mathbb{R}^d$. (To get an arbitrary number of pages, replace the tripod with a graph
having an arbitrary number of rays emanating from the center point. It bears mentioning
that every topologically stratified space $M$ is locally homeomorphic to an open book
near any point on a stratum of dimension $\dim(M) - 1$. In other words, the tangent
cones to points on codimension 1 strata are open books. Hence this example is universal
in some sense.) In an open book, the mean sticks to the spine (the copy of $\mathbb{R}^d$
that is contained in all three pages) when three similarly situated points are wiggled,
although that wiggling can move the mean in arbitrary directions inside of the spine [19].
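A small sketch shows both effects at once. It assumes the product metric on the open book, so that squared distance is the squared tripod distance plus the squared Euclidean distance along the spine; under that assumption the least-squares problem splits into a tripod part and an ordinary average along the spine. The encoding of a point as (page, distance from spine, spine coordinates) and the name `open_book_mean` are hypothetical.

```python
def open_book_mean(sample, pages=3):
    """Frechet mean in an open book, assuming the product metric.

    Each point is (page, distance from the spine, spine coordinates).
    The spine component is an ordinary average; the transverse component
    follows the tripod rule and is zero unless one page outweighs the rest.
    """
    n = len(sample)
    dim = len(sample[0][2])
    spine_mean = tuple(sum(p[2][c] for p in sample) / n for c in range(dim))
    total = sum(t for _, t, _ in sample)
    for k in range(pages):
        on_page = sum(t for page, t, _ in sample if page == k)
        pull = 2 * on_page - total
        if pull > 0:
            return (k, pull / n, spine_mean)
    return (0, 0.0, spine_mean)  # the mean lies on the spine


before = [(0, 1.0, (0.0,)), (1, 1.0, (3.0,)), (2, 1.0, (0.0,))]
after = [(0, 1.25, (0.5,)), (1, 0.75, (3.0,)), (2, 1.0, (0.25,))]

print(open_book_mean(before))  # (0, 0.0, (1.0,))
print(open_book_mean(after))   # (0, 0.0, (1.25,))
```

Wiggling the three points changes the spine coordinate of the mean from 1.0 to 1.25 while its distance from the spine stays exactly 0.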

With these examples and others in mind, a formal definition has been developed [24]:
let $\mathcal{M}$ be a set of measures on a metric space $K$. Assume $\mathcal{M}$ has a
given topology. A mean is a continuous assignment
$\mathcal{M} \to \{\text{closed subsets of } K\}$. A measure $\mu$ sticks to a closed
subset $C \subseteq K$ if every neighborhood of $\mu$ in $\mathcal{M}$ contains a nonempty
open subset consisting of measures whose mean sets are contained in $C$.

Stickiness implies that it is possible for the means of large samples from a distribution 
on a stratified space to lie in a subset of low dimension, with positive probability
(equal to 1 in some cases, such as the tripod), even if the distribution being sampled 
is well behaved [2, 4, 19, 24]. This contrasts with usual laws of large numbers, where 
sample means approach the population mean but almost surely never land on it—or 
on any given subset of low dimension containing it. Thinking in terms of central limit 
theorems, whereas in usual cases the limiting distributions have full support, in sticky 
cases the limiting distributions can have components supported on low-dimensional 
subsets of the sample space. 
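A quick simulation illustrates the contrast with the usual law of large numbers. The distribution used here is an assumption made for illustration: a leg of the tripod chosen uniformly at random and a distance from the center drawn uniformly from [0, 1]. The mean is computed from the first-order condition that one leg must outweigh all others combined for the mean to leave the center.

```python
import random


def tripod_mean(sample, legs=3):
    """Frechet mean on the tripod; points are (leg, distance from center)."""
    n = len(sample)
    total = sum(t for _, t in sample)
    for k in range(legs):
        on_leg = sum(t for leg, t in sample if leg == k)
        pull = 2 * on_leg - total
        if pull > 0:
            return (k, pull / n)
    return (0, 0.0)


def fraction_stuck(n, trials, seed=0):
    """Fraction of random samples of size n whose mean is exactly the center."""
    rng = random.Random(seed)
    stuck = 0
    for _ in range(trials):
        sample = [(rng.randrange(3), rng.random()) for _ in range(n)]
        if tripod_mean(sample)[1] == 0.0:
            stuck += 1
    return stuck / trials


for n in (3, 10, 50, 400):
    print(n, fraction_stuck(n, trials=200))
```

For large samples the reported fraction should be essentially 1: the sample mean lands exactly on the center, a subset of dimension zero, rather than merely approaching it.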

Examples aside, central limit theorems on stratified spaces of any generality have
yet to be formulated, let alone proved. Subtle and deep behavior associated with the
boundary between sticky and non-sticky is still being discovered. In particular, the
distinction between positive and negative curvature seems to be critical for stickiness.
One common type of positive curvature, particularly in statistical problems, appears
when a flat or positively curved smooth manifold is quotiented by a proper Lie group
action. Shape spaces (see [27, 28]), including those applicable to fly wings with
constant topology by keeping track of the locations of the vertices of the graph, have this
form, for instance, being quotients of matrix spaces by actions of rotations, scaling, or
projective transformations. Huckemann [23] has shown essentially that when sampling
from a stratified space that is a quotient of this form, Fréchet means run away from
singularities and hence land almost surely in the smooth locus. On the other hand,
every case exhibiting stickiness has negative curvature (in the sense of Alexandrov:
curvature bounded strictly above by 0; see [9]). It remains an open problem to formulate
a condition, in terms of something like negative sectional curvature, that allows means
to run toward singularities of the space for appropriate types of distributions on it.

Further potential implications for pure mathematics emerge when considering how 
to recover curvature invariants from asymptotics instead of the other way around. In 
the smooth case, for local samples (that is, samples sufficiently near the mean), accounting
for curvature reduces central limit theorems to the Euclidean version. Thus, if the 
curvature is unknown but properties of the distribution are known, then the curvature 
ought to be recoverable. In singular settings, attempting such a recovery could give 
rise to singular analogues of smooth Riemannian curvature invariants. 

Phylogenetic trees. Moduli spaces of fly wing modules constitute just one of myriad 
ways that geometric probability on stratified spaces can arise. Among those, nothing is 
special about biology, except perhaps that its diversity of forms and the nature of their 
variation lend themselves to geometric data analysis of this sort. That said, the genesis 
of stratified statistics came directly from another evolutionary biology moduli space. 

One of the principal aims of systematic and evolutionary biology is to determine
relationships between species. Trees representing these kinships are reconstructed from
biological data such as DNA sequences or morphology. Experimental procedures
generate distributions on the set of phylogenetic trees in multiple ways. For example,
the evolutionary history of a single gene across multiple individuals is represented by
a “gene tree”. Natural processes such as incomplete lineage sorting cause gene trees
sampled from a set of individuals to differ in topology from one another and from the
evolutionary history of their set of species, that is, the history of population bifurcations
leading to divergence (see [30], for example). Another crucial example occurs not from the
data but from the method of inference: the problem of phylogenetic tree reconstruction
is intractable enough that probabilistic methods are commonly used, resulting in
posterior distributions instead of a single optimum [36, 25].

Mathematically speaking, a phylogenetic tree on a given set of species is a connected 
metric graph, with no loops, whose vertices of degree 1 (“leaves”) are labeled by the 
species. The introduction by Billera, Holmes, and Vogtmann of an appropriate moduli 
space for the problem, namely the space of phylogenetic trees [7], initiated a surge of 
activity attempting to mine the combinatorics and geometry of the space to devise 
statistical methods. And the combinatorics is formidable: for $n$ species, the tree space
$\mathcal{T}_n$ is a polyhedral stratified space composed of $(2n-3)!!$ Euclidean orthants
of dimension $n-2$. Despite its complexity, $\mathcal{T}_n$ succumbed: the advent of a
polynomial time algorithm for shortest paths in tree space [34] made it possible to compute
Fréchet means in $\mathcal{T}_n$ [1, 32] based on probability theory for nonpositively
curved spaces [38].
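To get a feel for how quickly the number of top-dimensional orthants grows, the count (2n - 3)!! and the dimension n - 2 stated above can be tabulated. The function names are illustrative, and the restriction to n of at least 3 is an assumption made so that the dimension is positive.

```python
def double_factorial(m):
    """Product m * (m - 2) * (m - 4) * ... down to 1 or 2."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def tree_space_orthants(n):
    """Number and dimension of the maximal orthants for n species."""
    if n < 3:
        raise ValueError("expected at least 3 species")
    return double_factorial(2 * n - 3), n - 2


for n in (3, 4, 5, 10):
    count, dim = tree_space_orthants(n)
    print(n, count, dim)
# 3 3 1
# 4 15 2
# 5 105 3
# 10 34459425 8
```

For three species the count is 3 orthants of dimension 1, which are three rays; for ten species there are already more than 34 million orthants.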

The geometric probability on stratified spaces in the previous two sections was
initially developed specifically to understand the behavior of Fréchet means in the moduli
space $\mathcal{T}_n$. The simple examples [19, 24] deal with informative local subsets of
$\mathcal{T}_n$. In addition to those, efforts are underway to prove laws of large numbers
and central limit theorems in the global context of $\mathcal{T}_n$ itself [2, 3].

Stickiness in tree space has a concrete, meaningful biological interpretation (although 
the jury is out on the extent to which the interpretation reflects reality). Points in strata 
of lower dimension represent phylogenetic trees with one or more non-binary vertices. 
Biologically, these are unresolved phylogenies: one species diverges simultaneously into 
three new species, for example, instead of first splitting into two new species followed 
by another binary divergence event from one of the two. Sets of phylogenetic trees from 
biological experiments often contain evidence for many or all of the possible sequences 
of binary divergence events that resolve a given multiple divergence. Stickiness implies 
that the mean tree will contain an unresolved divergence whenever there is insufficient 
strength of pull toward any one of the resolving binary sequences to support the
conclusion it represents. The picture to keep in mind is the tripod, whose stickiness we
saw earlier: it is the tree space $\mathcal{T}_3$ on three species.

Conclusion. Spaces of biological forms provide an environment in which mathematical
methods can assign distances between phenotypes. The lines of inquiry here fit into
biologist David Houle’s vision of phenomics [21], particularly the genotype-phenotype 
map. To make a long story short, selection acts on phenotype whereas descent and 
modification act via genotype, so it becomes desirable to compare phenotypic distance 
to genotypic distance, including some working definition of each distance in any given 
case study. In general, to grapple with statistical correlations between genotypes and 
phenotypes requires a mathematical parametrization for each of those biological concepts.
For genotypes, that is likely to involve combinatorial considerations, since the
basic quantum of information is discrete. Perhaps large-scale continuous analogues or 
approximations might be meaningful, as they are for statistical mechanics, and that is 
another potential departure point for mathematics. On the other hand, phenotype is 
often continuous in nature: what are the locations of the vertices in a fly wing? How 
do the arcs between the vertices bow outward or inward? How do these characteristics 
change from wing to wing? Parametrization in such a context requires thinking about 
spaces of continuous objects, the sort of thinking that mathematics is designed to 
carry out. The examples presented here demonstrate a sample of the kinds of abstract 
structures in pure mathematics—along with unexpected questions about them—that 
biological investigations reveal.